ABSTRACT

The Cambrian explosion is a grand challenge to science today and involves multidisciplinary study. This event is generally believed to be a result of genetic innovations, environmental factors and ecological interactions, even though there are many conflicting views on the nature and timing of metazoan origins. The crux of the matter is that an entire roadmap of evolution is missing, one that would allow us to discern the transition in biological complexity and to evaluate the critical role of the Cambrian explosion in the overall evolutionary context. Here we calculate the time of the Cambrian explosion by an innovative and accurate "C-value clock"; our result (560 million years ago) fits the fossil record well. We argue that a cause intrinsic to genome evolution determined the Cambrian explosion. We have found a general formula for evaluating the genome size of different species, by which major questions of the C-value enigma can be addressed and genome size evolution can be illustrated. The Cambrian explosion is essentially a major transition of biological complexity, which corresponds to a turning point in genome size evolution. The observed maximum prokaryotic complexity is a relic of the Cambrian explosion and is constrained by the maximum information storage capability in the observed universe. Our results open a new prospect for studying metazoan origins and molecular evolution.

INTRODUCTION



The broad outline of Cambrian diversification has been known for more than a century, but only in the post-genomic era have the data necessary to explain the nature of the Cambrian explosion become available. This problem originated in the disciplines of paleontology and stratigraphy, and the debate about it may be as old as the problem itself [1][2][3]. Some ascribe the Cambrian explosion to intrinsic causes, while others believe that it may have been triggered by environmental factors. Innovative ideas have proliferated in the past decade with new fossil discoveries and progress in biogeochemistry, molecular systematics and developmental genetics [4][5][6][7]. However, we still need insights from other fields such as genome size evolution, self-organization, complexity theory and the holographic principle [8][9][10][11] to fully resolve this long-running problem.

There is a profound relationship between the Cambrian explosion and the C-value enigma. Why did so many complex creatures appear in the late Neoproterozoic and Cambrian, but not earlier or later? We believe that the nature and timing of the Cambrian explosion are determined by the evolution of genome size (see the schematic in Supplementary Figure 1). We devised a "C-value clock" to calculate the time of the Cambrian explosion from genomic data. The C-value clock rests on two notions: that evolutionary relationships can be revealed by the correlation of protein length distributions, and that genome size evolution can be used as a chronometer.

The starting point of our theory is a formula for evaluating the genome size (namely the C-value) of different species. With this formula, major component questions of the C-value enigma can be addressed and genome size evolution can be illustrated. Consequently, genome size evolution can be used as an accurate chronometer to study macroevolution. We found a unique turning point in genome size evolution and calculated its time, which corresponds to the Cambrian explosion. We believe that the Cambrian explosion was essentially a major transition of biological complexity that occurred when prokaryotic complexity reached its maximum value. We suggest that biological complexity is constrained by the maximum information storage capability in the observed universe.



RESULTS AND DISCUSSIONS 



Genome size evolution. Genome sizes vary extensively within and between taxa. We found that the genome size $S$ can be determined by two variables: the non-coding DNA content $\eta$ and the correlation polar angle $\theta$. Hence we obtained an empirical formula of genome size for any contemporary species:

$$S(\eta, \theta) = s_0 \exp\left(\frac{\eta}{a} - \frac{\theta}{b}\right), \qquad (1)$$

where $s_0 = 7.96 \times 10^6$ base pairs (bp), $a = 0.165$ and $b = 0.176$ were obtained by least squares from the data of $S$, $\eta$ and $\theta$ for 54 species (see Supplementary Tables 1 and 2). We also obtained an empirical formula for the gene number, $N(\eta, \theta) = 1.48 \times 10^4 \exp(\eta/a' - \theta/b')$ with fitted coefficients $a'$ and $b'$ (see Methods), and the relationship between non-coding DNA and coding DNA for eukaryotes, $\log N_{nc} = 2.81 \log N_c - 12.5$. The predictions of these formulae agree with the experimental observations very well (Fig. 1a, 1b). The empirical formula of genome size is the starting point of our theory, and it is supported by many agreements between its predictions and experimental observations (especially the detailed agreements in Fig. 1, 3 and 4).
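As an illustration of how the genome size formula is evaluated, the following minimal Python sketch computes $S(\eta, \theta)$ from the three fitted constants quoted above. The function name and the two sample inputs are illustrative assumptions, not values taken from the supplementary tables.

```python
import math

# Fitted constants quoted in the text for Eqn. 1.
S0_BP = 7.96e6   # s_0, base pairs
A = 0.165        # a, scale for the non-coding DNA content eta
B = 0.176        # b, scale for the correlation polar angle theta


def genome_size(eta: float, theta: float) -> float:
    """Return the genome size in bp predicted by Eqn. 1."""
    return S0_BP * math.exp(eta / A - theta / B)


# Illustrative inputs (assumed for this sketch, not taken from the data tables).
prokaryote_like = genome_size(eta=0.115, theta=0.115)
eukaryote_like = genome_size(eta=0.988, theta=0.115)
print(f"{prokaryote_like:.3g} bp, {eukaryote_like:.3g} bp")
```

With these inputs the two values come out near $8.3 \times 10^6$ bp and $1.65 \times 10^9$ bp, which shows how strongly the exponential dependence on $\eta$ separates a prokaryote-like point from a eukaryote-like point.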

The formula of genome size for contemporary species helps us write down the formula of genome size evolution from $t = T_0$ (3,800 million years ago (Ma), the beginning of life [12]) to $t = 0$ (today). We introduced a function $s(t)$ to describe the overall trend of genome size evolution according to the distribution of species in the $\eta$-$\theta$ plane. This is the main assumption of our theory. We can distinguish two phases in genome size evolution (Fig. 2a). In phase I, all the species in the lower triangle of the $\eta$-$\theta$ plane are simple prokaryotes and their non-coding DNA contents are low. In phase II, all the species in the upper triangle of the $\eta$-$\theta$ plane are eukaryotes, and the non-coding DNA content increased to the maximum value $\eta^*$. It is reasonable, therefore, to take the critical event that divides the two phases as the Cambrian explosion.

Thus, we can obtain the formula of genome size evolution: $s_I(t) = s_1 \exp(t/\tau_1)$ for phase I and $s_{II}(t) = s_2 \exp(t/\tau_2)$ for phase II, where $s_1 = 1.98 \times 10^7$ bp, $\tau_1 = 644$ million years, $s_2 = 1.65 \times 10^9$ bp and $\tau_2 = 106$ million years (Fig. 2b). The result agrees qualitatively with the straightforward (but somewhat coarse) estimate of genome size evolution in Ref. [13] in that (i) genome size increases exponentially in both (namely linearly in Fig. 2b) and (ii) both show a unique turning point in genome size evolution (Fig. 2b). As expected, the dividing value of genome size in our theory, $s_I(T_c) = s_{II}(T_c) = s^*$, agrees with the maximum prokaryotic genome size observed [?].

Explanation of the C-value enigma. On the surface, the C-value enigma concerns the lack of correlation between genome size and morphological complexity; at a deeper level, it concerns the nature of the Cambrian explosion. From the genome size formula we obtained some general properties of genome size evolution, with which major questions of the C-value enigma can be explained.

According to the genome size evolution formula, we can distinguish two speeds of genome size evolution. In phase I, the genome size doubled about every 446 million years ($\tau_1 \ln 2$) on the whole, whereas in phase II it doubled about every 73 million years ($\tau_2 \ln 2$). So genome size evolution in phase II (mainly an increase in non-coding DNA) was much faster than in phase I (mainly an increase in coding DNA). The pattern of exponential increase can be simply understood from the relation $\Delta s(t) \propto s\,\Delta t$ for each of the two phases. The overall picture of genome size evolution reflects the entire roadmap of the evolution of biological complexity, which is helpful for understanding macroevolution.
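The doubling-time arithmetic can be sketched in a few lines of Python, assuming that the doubling time of an exponential $s \exp(t/\tau)$ is $\tau \ln 2$ and using the two e-folding times quoted for Fig. 2b. The function name is an illustrative choice.

```python
import math

TAU_1_MYR = 644.0  # e-folding time of phase I quoted in the text
TAU_2_MYR = 106.0  # e-folding time of phase II quoted in the text


def doubling_time(tau_myr: float) -> float:
    """Doubling time of s(t) = s * exp(t / tau), in million years."""
    return tau_myr * math.log(2.0)


for label, tau in (("phase I", TAU_1_MYR), ("phase II", TAU_2_MYR)):
    print(f"{label}: doubling time = {doubling_time(tau):.0f} Myr")
```

The ratio of the two doubling times equals the ratio $\tau_1/\tau_2$, so the roughly sixfold difference in speed between the phases does not depend on the choice of doubling as the yardstick.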

The Cambrian explosion can help to account for the genome size ranges in taxa. All phyla appeared almost simultaneously in the Cambrian explosion. In the course of evolution, therefore, $\eta$ increases from $\bar\eta$ to $\eta^*$ for each phylum (Fig. 2a). The genome size in a phylum thus varies by about $\Delta \approx 2.4$ orders of magnitude (Fig. 3). The history of a class is generally shorter than that of a phylum, so the genome size range in a class is smaller than that in a phylum: it varies by about $\delta = \lg \exp(\Delta\theta / b) \approx 0.5$ orders of magnitude (Fig. 3), where the uncertainty $\Delta\theta$ is estimated as 0.2 (Fig. 2a). Furthermore, we can explain the lack of correlation between genome size and morphological complexity. The origin of phyla in the Cambrian explosion was related to the appearance of kernels of gene regulatory networks, whose complexity varied notably. But the C-values of species in different phyla did not vary notably [?]. So the discrepancy between genome size and eukaryotic complexity existed from the very beginning (Fig. 3).

Three clusters of prokaryotes, $C_{Gram-}$, $C_{Gram+}$ and $C_{small}$, can be distinguished in the lower triangle of the $\eta$-$\theta$ plane (Fig. 2a), in which Gram-negative bacteria, Gram-positive bacteria and bacteria with small genome size are in the majority, respectively [14]. We evenly distributed 6038 dots (representing "species") in three symmetric areas enclosing $C_{Gram-}$, $C_{Gram+}$ and $C_{small}$ in Fig. 4a (the same areas as in Fig. 2a). After mapping the three symmetric areas by the non-linear transformation of Eqn. 1, we obtained three asymmetric areas $C'_{Gram-}$, $C'_{Gram+}$ and $C'_{small}$ in the $\eta$-$s$ plane in Fig. 4c. Finally, we obtained the prokaryotic genome size distribution in Fig. 4b by counting the number of species in each genome size bin of identical width in Fig. 4c.
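The procedure just described (scatter dots uniformly, map them through the genome size formula, count per bin) can be sketched as follows. This is only a hedged illustration: the rectangle used here is hypothetical, since the three areas of Fig. 4a are not specified numerically in the text, and the helper names are assumptions.

```python
import math
import random
from collections import Counter

S0_BP, A, B = 7.96e6, 0.165, 0.176  # constants of Eqn. 1


def genome_size(eta, theta):
    """Genome size in bp from Eqn. 1."""
    return S0_BP * math.exp(eta / A - theta / B)


def sample_region(n, eta_range, theta_range, rng):
    """Uniformly scatter n dots in a rectangle of the eta-theta plane."""
    return [(rng.uniform(*eta_range), rng.uniform(*theta_range)) for _ in range(n)]


def size_histogram(dots, bin_width_bp):
    """Count dots per genome size bin of identical width."""
    counts = Counter(int(genome_size(eta, theta) // bin_width_bp) for eta, theta in dots)
    return dict(sorted(counts.items()))


rng = random.Random(0)
# HYPOTHETICAL rectangle standing in for one of the areas drawn in Fig. 4a.
dots = sample_region(2000, eta_range=(0.05, 0.20), theta_range=(0.15, 0.45), rng=rng)
hist = size_histogram(dots, bin_width_bp=1.0e6)
for bin_index, count in hist.items():
    print(f"{bin_index}-{bin_index + 1} Mbp: {count}")
```

Because the map from $(\eta, \theta)$ to $S$ is exponential, a uniform cloud of dots becomes a skewed histogram, which is the mechanism by which symmetric areas turn into an asymmetric genome size distribution.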

Timing of the Cambrian explosion. The time $T_c$ of the Cambrian explosion can be calculated from the formula of genome size evolution. The function $s_I(t)$ represents the evolution of coding DNA; its extrapolated value $s_I(0) = s_1$ represents the size of coding DNA at present. The value $s_{II}(0) = s_2$ represents the total genome size at present. For the coding DNA content at present, equating the experimental value with the theoretical prediction gives $1 - \eta^* = s_1/s_2$, where $s_1$ and $s_2$ are functions of $T_c$. From this equation we have

$$T_c = T_0 \left\{ 1 - \left( \frac{b}{1-\bar\eta} \ln(1-\eta^*) + \frac{b}{a}\,\frac{\eta^* - \bar\eta}{1-\bar\eta} + 1 \right)^{-1} \right\} = f(\eta^*). \qquad (2)$$

This is the formula for calculating the time of the Cambrian explosion by the C-value clock, which differs radically from molecular clock estimates (Fig. 2c) [15][16]. The value $\eta^*$ should be that of the species whose $\eta$ is the largest and whose complexity is the greatest. The best choice is none other than human: $\eta^* = 0.988$ [17][18]. We therefore obtained the Cambrian explosion time $T_c = f(0.988) = 560$ Ma. Our result agrees with the fossil record very well (Fig. 2d).
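The C-value clock estimate can be reproduced numerically with a short Python sketch. It assumes the reading of Eqn. 2 that follows from $1 - \eta^* = s_1/s_2$, namely $T_c = T_0\{1 - [\frac{b}{1-\bar\eta}\ln(1-\eta^*) + \frac{b}{a}\frac{\eta^*-\bar\eta}{1-\bar\eta} + 1]^{-1}\}$, with $\bar\eta = 0.115$ as given in Methods; the function name is an illustrative choice.

```python
import math

A, B = 0.165, 0.176   # constants of Eqn. 1
ETA_BAR = 0.115       # average non-coding DNA content of prokaryotes (Methods)
T0_MA = 3800.0        # beginning of life, Ma


def cambrian_time(eta_star: float, t0_ma: float = T0_MA) -> float:
    """Evaluate T_c = f(eta*) as assumed above; returns a magnitude in Ma."""
    x = (B / (1.0 - ETA_BAR)) * math.log(1.0 - eta_star) \
        + (B / A) * (eta_star - ETA_BAR) / (1.0 - ETA_BAR) + 1.0
    return t0_ma * (1.0 - 1.0 / x)


print(f"T_c = {cambrian_time(0.988):.0f} Ma")
```

With the rounded constants quoted in the text the result lies within about one million years of 560 Ma; the small residual is attributable to rounding of the fitted parameters.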

This main result of the C-value clock shows that the Cambrian explosion corresponds to a turning point in genome size evolution. To our knowledge, this is the first time that the timing of the Cambrian explosion has been successfully reconciled between paleontology and molecular biology. Considering the sensitive dependence of $T_c$ on $\eta^*$ (Fig. 2d), it is remarkable that almost the exact time of the Cambrian explosion can be calculated from the non-coding DNA content of the human genome. The subtle relationship $T_c = f(\eta^*)$ indicates a close connection between the rapid expansion of non-coding DNA and the cause of the Cambrian explosion. The genetic mechanism can give us a clear and in-depth understanding of the Cambrian explosion. Both the development and the evolution of animal body plans should be studied at the level of gene regulatory networks [5][19]. The appearance of genomic regulatory systems may be a prerequisite for animal evolution, and the phylum-specific or subphylum-specific kernels of gene regulatory networks may explain the conservation of major phyletic characters ever since the Cambrian [?].

According to Eqn. 2, we obtained

$$\frac{\Delta T_c}{T_c} = -75\,\frac{\Delta\eta^*}{\eta^*} + \frac{\Delta T_0}{T_0} - 5.2\,\frac{\Delta a}{a} + 0.85\,\frac{\Delta b}{b} - 0.57\,\frac{\Delta\bar\eta}{\bar\eta}.$$

The error in the predicted $T_c$ therefore comes mainly from the parameter $\eta^*$. Considering the uncertainty in human gene prediction, the error of the coding DNA content in the human genome is about 10% [17]. Hence the predicted value of $T_c$ ranges from 502 Ma to 560 Ma. Even if the databases of complete genomes and proteomes expand considerably in the future, the parameters in Eqn. 1 should change only slightly, so the main results of this paper will remain valid. Incidentally, if $T_0$ is chosen as the date of the earliest known microfossils, i.e., $T_0 = 3,500$ Ma, the prediction becomes $T_c = 516$ Ma. There is a notable discrepancy between the molecular clock estimates and the fossil record [16][20][21]. For this problem, the C-value clock works better than the molecular clocks. We can conclude that the C-value clock estimate agrees with the fossil record in principle (Fig. 2d).
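The relative sensitivities of $T_c$ to its parameters can be checked numerically. The sketch below assumes the same reading of Eqn. 2 as a function of $\eta^*$, $T_0$, $a$, $b$ and $\bar\eta$, and estimates each logarithmic derivative by a central finite difference; the dictionary of base values and the function names are illustrative.

```python
import math


def cambrian_time(eta_star, t0, a, b, eta_bar):
    """T_c from the assumed form of Eqn. 2, as a function of all five parameters."""
    x = (b / (1 - eta_bar)) * math.log(1 - eta_star) \
        + (b / a) * (eta_star - eta_bar) / (1 - eta_bar) + 1
    return t0 * (1 - 1 / x)


BASE = {"eta_star": 0.988, "t0": 3800.0, "a": 0.165, "b": 0.176, "eta_bar": 0.115}


def relative_sensitivity(name, rel_step=1e-6):
    """d ln(T_c) / d ln(parameter), by a central finite difference."""
    up, down = dict(BASE), dict(BASE)
    up[name] *= 1 + rel_step
    down[name] *= 1 - rel_step
    t_up, t_down = cambrian_time(**up), cambrian_time(**down)
    return (t_up - t_down) / (2 * rel_step * cambrian_time(**BASE))


for name in BASE:
    print(f"{name}: {relative_sensitivity(name):+.2f}")
```

The coefficient for $\eta^*$ is roughly two orders of magnitude larger than the others, which is why a 10% uncertainty in the coding fraction $1 - \eta^*$ (about 0.12% in $\eta^*$) already shifts $T_c$ by several tens of million years.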

If the history of life is compressed into a single day, why did complex life appear neither in the morning nor in the afternoon, but around half past eight in the evening? From the overall picture of genome size evolution in Fig. 2b, we can explain why simple life actually predominated on the planet for the first 6/7 of evolutionary time. The reason is that non-coding DNA evolved much faster than coding DNA. The Cambrian explosion cannot have happened in the first half of evolutionary history, because $s_1$ is always less than $s_2$, so that the turning point had to appear later than the time $T_0/2$. Furthermore, the Cambrian explosion must have happened very late because $s_1$ is in fact much less than $s_2$ at present; that is, the slope for the evolution of non-coding DNA is much steeper than the slope for the evolution of coding DNA (Fig. 2b).
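The "single day" analogy is simple arithmetic, sketched here in Python under the assumption that the day starts at the origin of life (3,800 Ma) and ends today; the function name is illustrative.

```python
T0_MA = 3800.0   # age of life used in the text, Ma
TC_MA = 560.0    # C-value clock estimate of the Cambrian explosion, Ma


def clock_time(event_ma: float, origin_ma: float = T0_MA) -> str:
    """Map an event age onto a 24-hour day that starts at the origin of life."""
    elapsed_fraction = 1.0 - event_ma / origin_ma
    minutes = round(elapsed_fraction * 24 * 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


print(clock_time(TC_MA))              # time of day of the Cambrian explosion
print(f"{1.0 - TC_MA / T0_MA:.3f}")   # fraction of history that preceded it
```

The elapsed fraction of about 0.85 is close to 6/7, and the corresponding clock time falls just before half past eight in the evening.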



Nature of the Cambrian explosion. The formula of genome size evolution opens up an opportunity to investigate the entire roadmap of evolution based on biological complexity. It is observed that biological complexity increases faster and faster, but not smoothly [22][23][24]. The pattern of mass extinctions followed by rapid evolutionary radiations is widely considered to have fundamentally shaped the history of life, but it does not account for the Cambrian explosion. Evolution is not merely a mixture of accidental events. One with less perseverance could never spend billions of years assembling a jaguar from quarks! An overall mechanism of evolution is required to explain the Cambrian explosion. Genome size evolution is precisely a problem of macroevolution. In our theory, the function $s(t)$ represents not only the trend of genome size evolution but also the trend of the evolution of biological complexity, because prokaryotic complexity is related to genome size and eukaryotic complexity is related to the non-coding DNA content [18]. The turning point in genome size evolution implies that there was a critical value of biological complexity in evolution, which is supported by the fact that neither the genome size nor the complexity of prokaryotes has ever reached that of eukaryotes. The constraint on prokaryotic complexity demanded a leap in biological complexity. As a result, complex organisms successfully bypassed this constraint during the Cambrian.

Several explanations have been proposed for the maximum prokaryotic complexity [25][26][?]. Its existence can be explained by the theory of accelerating networks [27]. It has been suggested that prokaryotic complexity may have been limited throughout evolution by regulatory overhead, and conversely that complex eukaryotes must have bypassed this constraint by novel strategies [25][22]. We give another explanation based on Kauffman's theory and the holographic principle [?][11][28]. The theory of self-organization provides deep insight into the spontaneous emergence of order that graces the living world [9]. Prokaryotic complexity should be understood in terms of a dynamical system at the level of gene networks. We can therefore define prokaryotic complexity by the information stored in Boolean networks, which is so immense that it can reach the maximum information content $I_{univ}$ in the observed universe. The holographic bound in physics imposes a strict limit on biological complexity. Information forms a bridge between biology and physics [29][30]. We believe that the maximum prokaryotic complexity is constrained by the upper limit of information storage capacity in our universe. Hence the maximum complexity of accelerating networks in the above explanation can be given concretely.

The Cambrian explosion of animal phyla differs radically from all other radiations, such as the radiations of modern birds and mammals in the early Tertiary, because it corresponds to the unique critical event in genome size evolution. A cause intrinsic to genome evolution determined the Cambrian explosion, during which biological complexity leapt not only at the anatomical level but also at the molecular level. The stability of the genomic system became low before the Cambrian explosion because the old mechanism of evolution had reached its limit. At this critical moment, any extrinsic factor could have turned evolution in a new direction. Numerous complex animal body plans were destined to appear at a certain time. In contrast, the causes of other radiations were full of uncertainty. The nature of the Cambrian explosion must be studied in a broader context than before. The Cambrian explosion and the origin of life were the most important events in the evolution from nonliving systems to living systems. We believe that the C-value enigma and the Cambrian explosion will help us uncover the intricate mechanism of evolution. Our work establishes a multidisciplinary framework to explain the Cambrian explosion (see Supplementary Figure 1), which will shed light on the essence of evolution.



METHODS 



The definition of correlation polar angle $\theta$ and its biological meaning. The correlation polar angle indicates evolutionary relationship; its role in the C-value clock is as important as the role of sequence similarity in molecular clocks. The correlation polar angle is defined from protein length distributions, and it led us to the genome size formula once we recognized the relationship between genome size $S$ and the correlation polar angle $\theta$. In the following, we first define the correlation polar angle and then explain its biological meaning.

The protein length distribution is an intrinsic property of a species. It is defined as a distribution (namely a vector) $\mathbf{D} = (D_1, D_2, \ldots, D_n, \ldots)$, where $D_n$ is the number of proteins of length $n$ in the complete proteome of the species. Our protein length distributions were obtained from the 106 complete proteomes in the database Predictions for Entire Proteomes [32]. The normalized vector of protein length distribution $\mathbf{d}$ is defined by the direction of the vector $\mathbf{D}$:

$$\mathbf{d} = \frac{\mathbf{D}}{\sqrt{\mathbf{D} \cdot \mathbf{D}}}.$$

Because there are few proteins longer than 3000 amino acids in a complete proteome (Supplementary Figure 2c), we neglect them and set the length 3000 as the cutoff of protein length in the calculation. Hence both $\mathbf{D}$ and $\mathbf{d}$ are 3000-dimensional vectors, and each species corresponds to a point on the 3000-dimensional unit sphere (Supplementary Figure 4a). The polar axis of the spherical coordinates (Supplementary Figure 4a) is defined by the direction of the vector of the total protein length distribution of the 106 species (Supplementary Figure 2c):

$$\mathbf{Z} = \sum_{i \in 106\ \mathrm{species}} \mathbf{D}(i).$$

We denote the normalized vector of $\mathbf{Z}$ as the unit vector $\mathbf{z}$ of the polar axis; the corresponding point lies at the center of the swarm of 106 points on the unit sphere (Supplementary Figure 4a). The correlation polar angle $\theta$ of a species is defined by the polar angle of the corresponding vector of protein length distribution:

$$\theta = \frac{2}{\pi} \arccos(\mathbf{d} \cdot \mathbf{z}),$$

where the factor $2/\pi$ is added so that the value of $\theta$ ranges from 0 to 1.

The biological meaning of the correlation polar angle can be interpreted as the average evolutionary relationship between a species and all the other species (Supplementary Figure 3b). The smaller the value of $\theta$, the closer the average evolutionary relationship. This interpretation is based on the following two considerations. (1) Let the vectors $\mathbf{d}(i)$ and $\mathbf{d}(j)$ correspond to the protein length distributions of two species $i$ and $j$ (Supplementary Figure 4a). The correlation between the two protein length distributions can be defined by their inner product $C_{ij} = \mathbf{d}(i) \cdot \mathbf{d}(j)$. Hence we obtain the correlation matrix $(C_{ij})$ (Supplementary Figure 3a). We can see that the evolutionary relationship is closely related to the correlation between the protein length distributions. The correlation polar angle $\theta_i$ of species $i$ can be interpreted as the average evolutionary relationship according to (compare Supplementary Figures 3a and 3b):

$$\cos\left(\frac{\pi}{2}\theta_i\right) = \sum_{j \in 106\ \mathrm{species}} \frac{\mathbf{d}(i) \cdot \mathbf{D}(j)}{\sqrt{\mathbf{Z} \cdot \mathbf{Z}}} = \sum_{j \in 106\ \mathrm{species}} w(j) \cos\left(\frac{\pi}{2}\theta_{ij}\right),$$

where $\theta_{ij} = \frac{2}{\pi} \arccos(C_{ij})$ is the correlation angle between two species and $w(j) = \sqrt{\mathbf{D}(j) \cdot \mathbf{D}(j)} / \sqrt{\mathbf{Z} \cdot \mathbf{Z}}$ is the weight of species $j$ in the summation. (2) An auxiliary polar axis $\mathbf{z}'$ can be defined by another direction different from the polar axis. For example, we chose the direction corresponding to the distribution in Supplementary Figure 2d, so that the auxiliary polar angle is defined by $\phi = \frac{2}{\pi} \arccos(\mathbf{d} \cdot \mathbf{z}')$ (Supplementary Figure 4a). The high-dimensional unit sphere (dim = 3000) can then be projected onto the two-dimensional $\theta$-$\phi$ plane, where eukaryotes, archaebacteria and eubacteria gather in three separate areas (Supplementary Figure 4b) and closely related species also form clusters. So the correlation polar angle is a useful tool for studying evolutionary relationships. The conclusion remains valid if we choose other directions for the auxiliary polar axis.
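The construction of the correlation polar angle can be sketched in Python as follows, assuming the definition $\theta = \frac{2}{\pi}\arccos(\mathbf{d} \cdot \mathbf{z})$ with $\mathbf{d}$ the normalized length distribution of a species and $\mathbf{z}$ the normalized sum over all species. The toy count vectors are made up for illustration and use a length cutoff of 5 rather than 3000.

```python
import math


def normalize(vec):
    """Return the unit vector d = D / sqrt(D . D)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def polar_angles(distributions):
    """Correlation polar angle of each species, scaled to [0, 1].

    distributions: list of equal-length count vectors D(i), where entry n
    is the number of proteins of length n + 1.
    """
    total = [sum(column) for column in zip(*distributions)]  # Z = sum of D(i)
    z = normalize(total)
    angles = []
    for dist in distributions:
        d = normalize(dist)
        cosine = min(1.0, sum(p * q for p, q in zip(d, z)))  # guard against rounding
        angles.append(2.0 / math.pi * math.acos(cosine))
    return angles


# Toy proteomes (made-up counts, length cutoff 5 instead of 3000).
toy = [[0, 4, 9, 3, 1], [1, 5, 8, 2, 0], [6, 2, 1, 0, 0]]
print([round(theta, 3) for theta in polar_angles(toy)])
```

In this toy example the third vector, whose counts are concentrated at short lengths unlike the other two, receives the largest angle, mirroring the interpretation of $\theta$ as the average distance of a species from the bulk of the others.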

Derivation of Eqn. 1: the genome size $S(\eta, \theta)$. We found that $\ln S$ decreases linearly with $\theta$ (Supplementary Figure 5a) but increases linearly with $\eta$ (Supplementary Figure 5b) on the whole. Hence, we wrote down the relation

$$\ln S = \ln s_0 + \frac{\eta}{a} - \frac{\theta}{b}.$$

From the biological data of genome size, $\eta$ and $\theta$ (Supplementary Table 2), we obtained the empirical formula of genome size, Eqn. 1, and its coefficients $a$, $b$ and $s_0$ by least squares. Similarly, we obtained the gene number formula

$$N(\eta, \theta) = n_0 \exp\left(\frac{\eta}{a'} - \frac{\theta}{b'}\right).$$

The value of $\eta$ varies little among prokaryotes in both formulae and $b \approx b'$, so the genome size is approximately proportional to the gene number:

$$\frac{S}{N} \approx \frac{s_0}{n_0} \exp\left(\eta \left(\frac{1}{a} - \frac{1}{a'}\right)\right) = 842,$$

which is close to the observed ratio [8]. This linear relationship breaks down for eukaryotes because of the large variation of $\eta$.
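The least-squares step behind the genome size formula can be sketched in pure Python. The sketch assumes the linear model $\ln S = c_0 + c_1 \eta + c_2 \theta$ with $s_0 = e^{c_0}$, $a = 1/c_1$ and $b = -1/c_2$, and it uses synthetic data generated from the formula itself, because the species data of the supplementary tables are not reproduced here. All function names are illustrative.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    rows = [list(r) + [x] for r, x in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def fit_genome_size(samples):
    """Least-squares fit of ln S = c0 + c1*eta + c2*theta; returns (s0, a, b)."""
    design = [(1.0, eta, theta) for eta, theta, _ in samples]
    target = [math.log(size) for _, _, size in samples]
    normal = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(design, target)) for i in range(3)]
    c0, c1, c2 = solve3(normal, rhs)
    return math.exp(c0), 1.0 / c1, -1.0 / c2


# SYNTHETIC (eta, theta, S) triples generated from the formula itself,
# standing in for the real species data.
synthetic = [(eta, theta, 7.96e6 * math.exp(eta / 0.165 - theta / 0.176))
             for eta in (0.05, 0.12, 0.5, 0.9) for theta in (0.1, 0.3, 0.6)]
s0, a, b = fit_genome_size(synthetic)
print(f"s0 = {s0:.3g} bp, a = {a:.3f}, b = {b:.3f}")
```

Fitting in log space turns the exponential model into an ordinary linear regression, so the normal equations are only 3 by 3; on noise-free synthetic input the fit simply recovers the constants used to generate it.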



The relationship between non-coding DNA $N_{nc}$ and coding DNA $N_c$ for eukaryotes. The average protein length for eukaryotes is about 450 amino acids, so the logarithm of coding DNA for eukaryotes is about $\log N_c = \log(3 \times 450\, n_0) + \eta/a' - \theta/b'$ according to the gene number formula, and the logarithm of non-coding DNA is about $\log N_{nc} = \log s_0 + \eta/a - \theta/b + \log \eta$ according to the genome size formula. Eliminating the term linear in $\eta$ between the two expressions, we have

$$\log N_{nc} = \frac{a'}{a} \log N_c + \log s_0 - \frac{a'}{a} \log(1350\, n_0) - \frac{\theta}{b} + \frac{a'}{a}\,\frac{\theta}{b'} + \log \eta \approx 2.81 \log N_c - 12.5,$$

where we let $\log \eta \approx \log 0.5$ in the calculation. From the experimental observation (Figure 1 in Ref. [33]), we obtain the relationship $\log N_{nc} = 2.82 \log N_c - 12.8$ between non-coding DNA and coding DNA for actual species on the whole, choosing the two points (6.8, 6.4) and (7.9, 9.5) in Figure 1 of Ref. [33] to determine the linear relationship. Our result agrees very well with the experimental observation.
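The observational line quoted above follows from the two chosen points by elementary arithmetic, sketched here in Python; the helper name is an illustrative choice.

```python
def line_through(p, q):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# Points (log N_c, log N_nc) read off the reference figure, as quoted in the text.
slope, intercept = line_through((6.8, 6.4), (7.9, 9.5))
print(f"log N_nc = {slope:.2f} log N_c {intercept:+.1f}")
```

The slope 3.1/1.1 rounds to 2.82 and the intercept to -12.8, to be compared with the predicted 2.81 and -12.5.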

The genome size evolution function $s(t)$. We can observe a right-angled distribution of the contemporary species in the $\eta$-$\theta$ plane (Fig. 2a). The prokaryotes and the eukaryotes are separated by the diagonal line $\eta = \theta$. An underlying mechanism of genome size evolution is necessary to account for this distribution. Some species originated earlier while others originated later. As a result, the distribution of species in the $\eta$-$\theta$ plane has recorded information about genome size evolution, from which we can write down the genome size evolution function.

The prokaryotes lie around the horizontal line $\eta = \bar\eta = 0.115$, where $\bar\eta$ is the average of $\eta$ for 48 prokaryotes (see Supplementary Tables 1 and 2). According to Eqn. 1, the genome size of prokaryotes increases as $\theta$ decreases. When $\theta$ is close to 1, there are few species because the corresponding genome size is too small for contemporary species. On the other hand, the eukaryotes lie around the vertical line $\theta = \bar\eta$, and their genome size increases as $\eta$ increases.

We introduced a function $s(t)$ to describe the overall trend of genome size evolution according to the observed right-angled distribution, whose turning point corresponds to the largest genome size of prokaryotes (Fig. 2a). It is reasonable to define the genome size evolution function $s(t)$ as evolving leftwards along the horizontal line $\eta = \bar\eta$ and subsequently upwards along the vertical line $\theta = \bar\eta$ in the $\eta$-$\theta$ plane. This definition agrees not only with the right-angled distribution of species in the $\eta$-$\theta$ plane but also with the overall trend of genome size evolution from small to large.

Derivation of Eqn. 2: the Cambrian explosion time $T_c$. Here $t$, $T_0$ and $T_c$ are treated as signed times measured from the present ($t = 0$) and negative in the past; the values quoted in Ma are their magnitudes. In phase I, $\eta(t) = \bar\eta$ and $\theta(t)$ decreases linearly from 1 to $\bar\eta$, i.e., $\theta(t) = 1 - (1 - \bar\eta)(t - T_0)/(T_c - T_0)$. So we have

$$s_I(t) = s_0 \exp\left(\frac{\eta(t)}{a} - \frac{\theta(t)}{b}\right) = s_1 \exp(t/\tau_1),$$

where $s_1 = s_0 \exp\left(\frac{\bar\eta}{a} - \frac{T_c - \bar\eta T_0}{b\,(T_c - T_0)}\right)$ and $\tau_1 = \frac{b\,(T_c - T_0)}{1 - \bar\eta}$. Incidentally, we have $s^* = s_I(T_c) = s_0 \exp\left(\frac{\bar\eta}{a} - \frac{\bar\eta}{b}\right)$. In phase II, $\theta(t) = \bar\eta$ and $\eta(t) = \eta^* - (\eta^* - \bar\eta)\,t/T_c$. So we have

$$s_{II}(t) = s_0 \exp\left(\frac{\eta(t)}{a} - \frac{\theta(t)}{b}\right) = s_2 \exp(t/\tau_2),$$

where $s_2 = s_0 \exp\left(\frac{\eta^*}{a} - \frac{\bar\eta}{b}\right)$ and $\tau_2 = -\frac{a\,T_c}{\eta^* - \bar\eta}$. Finally, substituting the expressions of $s_1$ and $s_2$ into the equation $1 - \eta^* = s_1/s_2$, we obtained Eqn. 2.
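The two-phase construction can be checked numerically with the short Python sketch below. It assumes signed times (negative in the past, zero today), a linear decrease of $\theta$ from 1 to $\bar\eta$ in phase I, and a linear increase of $\eta$ from $\bar\eta$ to $\eta^*$ in phase II; the closed forms for $s_1$, $\tau_1$, $s_2$ and $\tau_2$ in the code follow from substituting these linear paths into the genome size formula and are part of that assumption.

```python
import math

S0, A, B = 7.96e6, 0.165, 0.176      # constants of Eqn. 1
ETA_BAR, ETA_STAR = 0.115, 0.988     # prokaryote average and human value
T0, TC = -3800.0, -560.0             # ASSUMED signed times in Myr (past < 0)

# Phase I: eta = ETA_BAR, theta falls linearly from 1 (at T0) to ETA_BAR (at TC).
tau1 = B * (TC - T0) / (1.0 - ETA_BAR)
s1 = S0 * math.exp(ETA_BAR / A - (TC - ETA_BAR * T0) / (B * (TC - T0)))

# Phase II: theta = ETA_BAR, eta rises linearly from ETA_BAR (at TC) to ETA_STAR (today).
tau2 = -A * TC / (ETA_STAR - ETA_BAR)
s2 = S0 * math.exp(ETA_STAR / A - ETA_BAR / B)


def s_phase1(t):
    """Genome size trend in phase I at signed time t (Myr)."""
    return s1 * math.exp(t / tau1)


def s_phase2(t):
    """Genome size trend in phase II at signed time t (Myr)."""
    return s2 * math.exp(t / tau2)


print(f"tau1 = {tau1:.0f} Myr, s1 = {s1:.3g} bp")
print(f"tau2 = {tau2:.0f} Myr, s2 = {s2:.3g} bp")
print(f"continuity at T_c: {s_phase1(TC):.3g} vs {s_phase2(TC):.3g} bp")
```

Under these assumptions the sketch reproduces the quoted values ($\tau_1 \approx 644$ Myr, $s_1 \approx 1.98 \times 10^7$ bp, $\tau_2 \approx 106$ Myr, $s_2 \approx 1.65 \times 10^9$ bp), and the two branches meet at the turning point at about $8.3 \times 10^6$ bp.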



Upper limit of the prokaryotic complexity. Boolean networks have for several decades received much attention as a means of understanding the underlying mechanisms of evo-devo biology [34]. We define the network $N_L$ as a Boolean network whose nodes are all possible protein sequences of length less than $L$. Since $N_L$ has about $20^L$ nodes, the size of its state space is $\sim 2^{20^L}$. According to Shannon's theory, the information stored in this network is $I_{net} \sim \log_2 2^{20^L} = 20^L$ bits (Supplementary Figure 6). Types of prokaryotes can be interpreted as attractors of the Boolean network $N_L$, which are robust against perturbations in evolution [34]. An actual genome of an organism can be denoted by one point among the total $\sim 2^{20^L}$ points in the state space of $N_L$. Based on the consideration that biological complexity should be evaluated at the level of gene regulatory networks, the prokaryotic complexity can be defined by the information $I_{net}$ stored in $N_L$. Its value is much greater than the information stored in the genetic sequences; the latter is not sufficient to measure biological complexity because it overlooks the complexity at the level of gene networks. This definition does not apply to eukaryotic complexity, which may involve RNA regulation [22].

We can show that the constrained maximum complexity of unicellular organisms can be explained by the upper limit of the information stored in a finite space. A major advance in the understanding of the fundamental laws of nature originated in the field of quantum gravity [35][36][28]. It states that the information storage capacity of a spatially finite system must be limited by its boundary area measured in units of four Planck areas, unless the second law of thermodynamics is untrue. Consequently, we can obtain the maximum information storage capacity in the observable universe as $I_{univ} \approx 10^{122}$ bits [37], which is a strict limit on the information content not only of physical systems but also of living organisms. Letting $I_{net} \sim I_{univ}$, we obtained $L \sim 94$ amino acids, which corresponds remarkably well to the most probable protein length for prokaryotes (Supplementary Figure 2b) [38]. So the information stored in the prokaryotic gene networks is so large as to be comparable to $I_{univ}$. Thus we have demonstrated the equivalence between the prokaryotic complexity and the information content $I_{univ}$ in our universe. We might say that the kind of spacetime determines the kind of life: a sufficiently vast spacetime is necessary to accommodate the immense information stored in life.
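The critical protein length follows from equating the network information $20^L$ bits with the bound of about $10^{122}$ bits. A minimal Python sketch of this arithmetic, with illustrative names, is:

```python
import math

I_UNIV_BITS_LOG10 = 122.0   # log10 of the holographic bound quoted in the text
ALPHABET = 20               # number of amino acid types


def critical_length(log10_bits: float = I_UNIV_BITS_LOG10) -> float:
    """Solve 20**L = 10**log10_bits for the protein length L."""
    return log10_bits / math.log10(ALPHABET)


print(f"L = {critical_length():.1f} amino acids")
```

Because $L$ depends only logarithmically on the bound, changing $I_{univ}$ by an order of magnitude shifts $L$ by less than one amino acid.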

We are grateful to Hefeng Wang, Lei Zhang, and Yachao Liu for valuable discussions. Sup- 
ported by NSF of China Grant No. of 10374075. 
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Figure 1: Comparison between predictions and observations for genome size and gene 
number. Our results quite fit the experimental observations not only for prokaryotes but also 
for eukaryotes. a, Genome size (correlation coefficient r = 0.974). b, Gene number (correlation 
coefficient r = 0.976). 
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Figure 2: Genome size evolution and the nature and timing of the Cambrian explosion, a, 

The distribution of species in 9 — i] plane and the function of genome size evolution, b, The 
turning point of genome size evolution (red: total genetic DNA and blue: coding DNA). Our 
result (solid lines) is supported by the coarse estimate (thick dotted lines, data for estimate time 
and genome size for 5 taxa are obtained from Ref [13])). c, Comparison between the molecular 
clock and the C-value clock, d, A sensitive relationship T = f(rj*). If varying 77* a little, T c 
will change much. The value of T c ranges approximately from 502 Ma to 560 Ma according to 
the C-value clock estimate. The result by C-value clock agrees with the fossil records [3] better 
than the molecular clock estimates lfT6ll [fT51 . There should be notable systematic errors in the 
usual method of molecular clock estimates. 
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Figure 3: Explanation of C-value enigma: genome size range and eukaryotic complexity. 

The ranges in genome size by order of magnitude (A ~ 2.4 for phyla and 5 ~ 0.5 for classes) fit 
the experimental observations in general (see Fig. 1 in Ref. llBTTl ). In observation, the genome 
sizes of majority phyla also vary by about 2 magnitudes and the genome sizes of majority classes 
vary by less than 1 magnitude flU. The complexity of a species inherits from the complexity 
of the corresponding phylum in general, so the complexity of species A in a more complex 
phylum can potentially outstrip the complexity of species B in a less complex phylum, though 
the genome size of A is much less than that of B. 
Figure 4: Explanation of the C-value enigma: prokaryotic genome size distribution. a, Evenly
distributed dots (representing "species") in three symmetric areas in the θ-η plane. b, The
predicted prokaryotic genome size distribution fits the experimental observation closely. The
principal features (the two peaks and the ratio of their heights) and even details such as the
shoulder, saddle and slope are almost the same as in the actual genome size distribution in
Fig. 10.12 of Ref. [8]. c, The predicted prokaryotic distribution in the s-η plane (three
asymmetric areas enclosed by lines) fits the intricate distribution of prokaryotes (green dots).
SUPPLEMENTARY INFORMATION

• Supplementary figures 1 ~ 6 

• Supplementary tables 1 ~ 2 
[Diagram for Supplementary Figure 1. Labelled elements:]

• genome size formula (Eqn. 1) [Fig. 1e]
• genome size evolution formula as C-value clock [Fig. 2b, 2c]
• Cambrian explosion time formula (Eqn. 2) [Fig. 2d]
• gene number formula [Fig. 1b]
• relation between non-coding and coding DNA
• genome size distribution [Fig. 4b]
• distribution in s-η plane [Fig. 4c]
• range of genome size [Fig. 3]
• C-value vs. eukaryotic complexity [Fig. 3]
• C-VALUE ENIGMA
• TURNING POINT in genome size evolution [Fig. 2b] -> kernels of gene regulatory networks, origin of phyla
• CAMBRIAN EXPLOSION
• observed maximum prokaryotic complexity
Supplementary Figure 1: The multidisciplinary framework to explain the Cambrian explosion.
We found a close relationship between the C-value enigma and the Cambrian explosion, and
therefore devised a new method, the C-value clock, based on the empirical formula of genome size.
The unique turning point in genome size evolution corresponds to the critical event of the Cambrian
explosion. The constraint on unicellular genome evolution resulted in an upper limit on the
complexity of unicellular organisms. We believe that the limited information storage capacity may
determine the complexity of gene networks. The origin of life and the Cambrian explosion were the
most important milestones in the evolution of biological complexity from nonliving systems to
living systems.
Supplementary Figure 2: Protein length distributions. a, The protein length distribution of
E. coli, as an example. b, Total protein length distribution for prokaryotes. c, Total protein
length distribution for all the species in the PEP database, which can be taken as the polar axis z
in Supplementary Figure 4a. d, An outline of the protein length distribution, which can be taken
as the polar axis z' in Supplementary Figure 4a.
Supplementary Figure 3: The evolutionary relationship can be revealed by the correlation
between protein length distributions. a, The correlation matrix (C_ij) represents the evolutionary
relationship between any pair of species i and j among the 106 species. The species in the matrix
are ordered by average protein length, from short to long, for archaebacteria, eubacteria, viruses
and eukaryotes respectively. The species are identified by their serial numbers in Supplementary Table 1
from the 1st position to the 106th position in the correlation matrix: 3, 8, 84, 65, 66, 83, 95, 51, 64, 82, 
96, 63, 87, 9, 104, 49, 10, 40, 31, 93, 76, 91, 45, 94, 78, 57, 21, 90, 86, 53, 89, 11, 59, 61, 58, 62, 42, 
13, 50, 34, 101, 4, 41, 60, 48, 33, 47, 26, 106, 56, 20, 35, 5, 100, 39, 2, 97, 46, 44, 37, 6, 54, 92, 85, 
16, 81, 102, 38, 28, 15, 73, 77, 19, 23, 70, 18, 22, 24, 14, 69, 80, 17, 27, 103, 36, 79, 98, 30, 74, 29, 32, 
99, 1, 75, 72, 12, 71, 52, 68, 25, 55, 7, 67, 105, 88, 43. b, The correlation polar angle θ for each of the
106 species (see Supplementary Table 2) can be interpreted as the average evolutionary relationship: the
larger the average correlation between protein length distributions, the smaller the value of θ; and the
smaller the value of θ, the closer the average evolutionary relationship.
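
The idea of correlating protein length distributions can be illustrated with a minimal sketch. It assumes a Pearson correlation between normalised length histograms; the bin width, the maximum length and the toy length lists are hypothetical, and the mapping from average correlation to the polar angle θ is not reproduced here because it is not given in this caption.

```python
from math import sqrt

def length_histogram(lengths, bin_width=50, max_length=1000):
    """Normalised histogram of protein lengths (fraction of proteins per bin)."""
    n_bins = max_length // bin_width
    counts = [0] * n_bins
    for length in lengths:
        # Lengths at or beyond max_length are folded into the last bin.
        idx = min(length // bin_width, n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(histograms):
    """Matrix C[i][j] of pairwise correlations between species i and j."""
    n = len(histograms)
    return [[pearson(histograms[i], histograms[j]) for j in range(n)]
            for i in range(n)]

def average_correlation(matrix):
    """Mean correlation of each species with all other species."""
    n = len(matrix)
    return [sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

# Toy (made-up) protein lengths for three hypothetical species.
toy_lengths = {
    "species_A": [120, 250, 310, 330, 480],
    "species_B": [110, 260, 320, 340, 470],
    "species_C": [700, 750, 820, 900, 950],
}
histograms = [length_histogram(v) for v in toy_lengths.values()]
C = correlation_matrix(histograms)
avg = average_correlation(C)
```

In this toy example species A and B share the same histogram, so they correlate strongly with each other and weakly with species C; under the caption's interpretation, a larger average correlation would correspond to a smaller θ.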
Supplementary Figure 4: The correlation polar angle and the evolutionary relationship.

a, The correlation polar angle θ and the auxiliary angle φ. b, Distribution of the three domains in the
θ-η plane. The evolutionary relationship is reflected by the correlation between the protein length
distributions. Species of different domains cluster in different areas.

(Plot panels a and b; legend: Eubacteria, Archaebacteria, Eukaryotes.)

Supplementary Figure 5: Relationship between the genome size, the non-coding DNA content
and the correlation polar angle. a, ln S increases as θ decreases on the whole. b, ln S
increases as η increases on the whole.
Supplementary Figure 6: Explanation of prokaryotic complexity by the Boolean network N_L
and its state space. Each node of the network N_L is one of the ~20^L possible amino acid
sequences and has two states, "on" or "off", according to the theory of Boolean networks. Each point
in the state space of N_L represents a "proteome" (a set of "proteins" whose states are "on", forming
an attractor of the Boolean network N_L). The attractor is robust against perturbations in evolution.
The evolution of a species can be described by the trajectory of its evolving proteome in the
state space of N_L. An underlying evolutionary mechanism is necessary to determine the movement of
the species in the global state space of N_L, so the complexity of the life system is proportional to the
number of points in the state space of N_L. The information stored in gene networks (~20^L bits)
reflects the complexity of the life system, which is compatible with the maximum information stored
in the observed universe, I_univ.
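
The counting argument behind this caption can be sketched as follows. A Boolean network with N on/off nodes has 2^N states, so specifying one state takes N bits; with one node per amino acid sequence of length L this gives about 20^L bits. Reading the caption's "20 L" as 20^L is an assumption, and the three-node network with its update rules below is purely hypothetical, included only to show what a state space and its attractors look like.

```python
from itertools import product
from math import log2

def num_nodes(L, alphabet=20):
    """Number of possible amino acid sequences of length L (one node each)."""
    return alphabet ** L

def state_space_bits(n_nodes):
    """Bits needed to specify one state of a network with n_nodes on/off nodes.

    The state space has 2**n_nodes points, and log2(2**n_nodes) == n_nodes.
    """
    return n_nodes

def step(state):
    """Synchronous update of a hypothetical 3-node Boolean network."""
    a, b, c = state
    return (b & c, a | c, a)

# Enumerate the full state space of the toy network.
states = list(product((0, 1), repeat=3))

# Fixed-point attractors: states that the update rule maps onto themselves.
fixed_points = [s for s in states if step(s) == s]

print(len(states), log2(len(states)))   # size of state space and its bit count
print(fixed_points)
print(state_space_bits(num_nodes(2)))   # bits for sequences of length L = 2
```

For the toy rules, two of the eight states are fixed points and two others form a cycle of length two; in the caption's picture such attractors play the role of "proteomes".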
Supplementary Table 1: Organisms in the database Predictions for Entire Proteomes (PEP)

Notes: There are 7 eukaryotes, 12 archaebacteria, 85 eubacteria and 2 viruses in PEP.
(No. 1)

PEP FILE: achfl.pep 

ORGANISM: Acholeplasma florum (Mesoplasma florum); A (M) florum; achfl 
DOMAIN: Eubacteria 
(No. 2) 

PEP FILE: aciad.pep 

ORGANISM: Acinetobacter sp (strain ADP1); A sp ADP1; aciad 

DOMAIN: Eubacteria 

(No. 3) 

PEP FILE: aerpe.pep 

ORGANISM: Aeropyrum pernix K1; A pernix K1; aerpe
DOMAIN: Archaebacteria 
(No. 4) 

PEP FILE: agrt5.pep 

ORGANISM: Agrobacterium tumefaciens (strain C58 / ATCC 33970); A tumefaciens; agrt5 
DOMAIN: Eubacteria 
(No. 5) 

PEP FILE: agrtu.pep 

ORGANISM: Agrobacterium tumefaciens; A tumefaciens; agrtu 

DOMAIN: Eubacteria 

(No. 6) 

PEP FILE: aquae.pep 

ORGANISM: Aquifex aeolicus; A aeolicus; aquae 

DOMAIN: Eubacteria 

(No. 7) 

PEP FILE: arath.pep 

ORGANISM: Arabidopsis thaliana; A thaliana; arath 
DOMAIN: Eukaryote 
(No. 8) 

PEP FILE: arcfu.pep 

ORGANISM: Archaeoglobus fulgidus; A fulgidus; arcfu
DOMAIN: Archaebacteria 
(No. 9) 

PEP FILE: bacaa.pep 

ORGANISM: Bacillus anthracis (strain Ames); B anthracis_Ames; bacaa 

DOMAIN: Eubacteria 

(No. 10) 

PEP FILE: bacce.pep 

ORGANISM: Bacillus cereus (ATCC 14579); B cereus (ATCC 14579); bacce 
DOMAIN: Eubacteria 
(No. 11)

PEP FILE: bacsu.pep 

ORGANISM: Bacillus subtilis; B subtilis; bacsu 
DOMAIN: Eubacteria 
(No. 12) 

PEP FILE: bactn.pep 

ORGANISM: Bacteroides thetaiotaomicron VPI-5482; B thetaiotaomicron VPI-5482; bactn 
DOMAIN: Eubacteria 
(No. 13) 

PEP FILE: barhe.pep 

ORGANISM: Bartonella henselae (Houston-1); B henselae Houston-1; barhe 

DOMAIN: Eubacteria 

(No. 14) 

PEP FILE: barqu.pep 

ORGANISM: Bartonella quintana (Toulouse); B quintana Toulouse; barqu 
DOMAIN: Eubacteria 
(No. 15) 

PEP FILE: bdeba.pep 

ORGANISM: Bdellovibrio bacteriovorus; B bacteriovorus; bdeba 

DOMAIN: Eubacteria 

(No. 16) 

PEP FILE: borbr.pep 

ORGANISM: Bordetella bronchiseptica RB50; B bronchiseptica RB50; borbr 
DOMAIN: Eubacteria 
(No. 17) 

PEP FILE: borbu.pep 

ORGANISM: Borrelia burgdorferi; B burgdorferi; borbu 
DOMAIN: Eubacteria 
(No. 18) 

PEP FILE: borpa.pep 

ORGANISM: Bordetella parapertussis; B parapertussis; borpa 

DOMAIN: Eubacteria 

(No. 19) 

PEP FILE: borpe.pep 

ORGANISM: Bordetella pertussis; B pertussis; borpe 

DOMAIN: Eubacteria 

(No. 20) 

PEP FILE: braja.pep 

ORGANISM: Bradyrhizobium japonicum; B japonicum; braja 
DOMAIN: Eubacteria 
(No. 21)

PEP FILE: brume.pep 

ORGANISM: Brucella melitensis; B melitensis; brume 
DOMAIN: Eubacteria 
(No. 22) 

PEP FILE: bucai.pep 

ORGANISM: Buchnera aphidicola (subsp. Acyrthosiphon pisum); B aphidicola (subsp. 
Acyrthosiphon pisum); bucai 

DOMAIN: Eubacteria 

(No. 23) 

PEP FILE: bucap.pep 

ORGANISM: Buchnera aphidicola (subsp. Schizaphis graminum); B aphidicola (subsp. 
Schizaphis graminum); bucap 

DOMAIN: Eubacteria 

(No. 24) 

PEP FILE: bucbp.pep 

ORGANISM: Buchnera aphidicola (subsp. Baizongia pistaciae); B aphidicola (subsp. 
Baizongia pistaciae); bucbp 

DOMAIN: Eubacteria 

(No. 25) 

PEP FILE: caeel.pep 

ORGANISM: Caenorhabditis elegans; C elegans; caeel 

DOMAIN: Eukaryote 

(No. 26) 

PEP FILE: camje.pep 

ORGANISM: Campylobacter jejuni; C jejuni; camje 
DOMAIN: Eubacteria 
(No. 27) 

PEP FILE: canbf.pep 

ORGANISM: Candidatus Blochmannia floridanus; C Blochmannia floridanus; canbf 
DOMAIN: Eubacteria 
(No. 28) 

PEP FILE: caucr.pep 

ORGANISM: Caulobacter crescentus; C crescentus; caucr 

DOMAIN: Eubacteria 

(No. 29) 

PEP FILE: chlcv.pep 

ORGANISM: Chlamydophila caviae; C caviae; chlcv 

DOMAIN: Eubacteria 

(No. 30) 

PEP FILE: chlmu.pep 

ORGANISM: Chlamydia muridarum; C muridarum; chlmu 
DOMAIN: Eubacteria 
(No. 31)

PEP FILE: chlte.pep 

ORGANISM: Chlorobium tepidum; C tepidum; chlte 
DOMAIN: Eubacteria 
(No. 32) 

PEP FILE: chltr.pep 

ORGANISM: Chlamydia trachomatis; C trachomatis; chltr 
DOMAIN: Eubacteria 
(No. 33) 

PEP FILE: chrvo.pep 

ORGANISM: Chromobacterium violaceum ATCC 12472; C violaceum ATCC 12472; chrvo 

DOMAIN: Eubacteria 

(No. 34) 

PEP FILE: cloab.pep 

ORGANISM: Clostridium acetobutylicum; C acetobutylicum; cloab 
DOMAIN: Eubacteria 
(No. 35) 

PEP FILE: clope.pep 

ORGANISM: Clostridium perfringens; C perfringens; clope

DOMAIN: Eubacteria 

(No. 36) 

PEP FILE: clote.pep 

ORGANISM: Clostridium tetani; C tetani; clote 
DOMAIN: Eubacteria 
(No. 37) 

PEP FILE: cordi.pep 

ORGANISM: Corynebacterium diphtheriae NCTC 13129; C diphtheriae NCTC 13129; cordi 
DOMAIN: Eubacteria 
(No. 38) 

PEP FILE: coref.pep 

ORGANISM: Corynebacterium efficiens; C efficiens; coref 

DOMAIN: Eubacteria 

(No. 39) 

PEP FILE: corgi.pep

ORGANISM: Corynebacterium glutamicum; C glutamicum; corgi 

DOMAIN: Eubacteria 

(No. 40) 

PEP FILE: coxbu.pep 

ORGANISM: Coxiella burnetii; C burnetii; coxbu 
DOMAIN: Eubacteria 
(No. 41)

PEP FILE: deira.pep 

ORGANISM: Deinococcus radiodurans; D radiodurans; deira 
DOMAIN: Eubacteria 
(No. 42) 

PEP FILE: desvh.pep 

ORGANISM: Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough; 
D vulgaris subsp. vulgaris str. Hildenborough; desvh 
DOMAIN: Eubacteria 
(No. 43) 

PEP FILE: drome.pep 

ORGANISM: Drosophila melanogaster; D melanogaster; drome 

DOMAIN: Eukaryote 

(No. 44) 

PEP FILE: ecoli.pep 

ORGANISM: Escherichia coli; E coli; ecoli 
DOMAIN: Eubacteria 
(No. 45) 

PEP FILE: entfa.pep 

ORGANISM: Enterococcus faecalis; E faecalis; entfa 

DOMAIN: Eubacteria 

(No. 46) 

PEP FILE: erwca.pep 

ORGANISM: Erwinia carotovora; E carotovora; erwca 
DOMAIN: Eubacteria 
(No. 47) 

PEP FILE: fusnu.pep 

ORGANISM: Fusobacterium nucleatum; F nucleatum; fusnu 
DOMAIN: Eubacteria 
(No. 48) 

PEP FILE: glovi.pep 

ORGANISM: Gloeobacter violaceus; G violaceus; glovi 
DOMAIN: Eubacteria 
(No. 49) 

PEP FILE: haedu.pep 

ORGANISM: Haemophilus ducreyi; H ducreyi; haedu 

DOMAIN: Eubacteria 

(No. 50) 

PEP FILE: haein.pep 

ORGANISM: Haemophilus influenzae; H influenzae; haein 
DOMAIN: Eubacteria 
(No. 51)

PEP FILE: haln1.pep

ORGANISM: Halobacterium sp. (strain NRC-1); H sp. (strain NRC-1); haln1
DOMAIN: Archaebacteria 
(No. 52) 

PEP FILE: hcmva.pep 

ORGANISM: Human cytomegalovirus (strain AD169); HCMV (strain AD169); hcmva 
DOMAIN: virus 
(No. 53) 

PEP FILE: helhe.pep 

ORGANISM: Helicobacter heilmannii; H heilmannii; helhe 

DOMAIN: Eubacteria 

(No. 54) 

PEP FILE: helpy.pep 

ORGANISM: Helicobacter pylori; H pylori; helpy 
DOMAIN: Eubacteria 
(No. 55) 

PEP FILE: human.pep 

ORGANISM: Homo sapiens; H sapiens; human 
DOMAIN: Eukaryote 
(No. 56) 

PEP FILE: lacjo.pep 

ORGANISM: Lactobacillus johnsonii; L johnsonii; lacjo 
DOMAIN: Eubacteria 
(No. 57) 

PEP FILE: lacla.pep 

ORGANISM: Lactococcus lactis (subsp. lactis); L lactis (subsp. lactis); lacla 
DOMAIN: Eubacteria 
(No. 58) 

PEP FILE: lacpl.pep 

ORGANISM: Lactobacillus plantarum WCFS1; L plantarum WCFS1; lacpl 

DOMAIN: Eubacteria 

(No. 59) 

PEP FILE: leixx.pep 

ORGANISM: Leifsonia xyli (subsp. xyli); L xyli (subsp. xyli); leixx 

DOMAIN: Eubacteria 

(No. 60) 

PEP FILE: lepic.pep 

ORGANISM: Leptospira interrogans (serogroup Icterohaemorrhagiae / serovar Copenhageni); 
L interrogans (serogroup Icterohaemorrhagiae / serovar Copenhageni); lepic 
DOMAIN: Eubacteria 
(No. 61)

PEP FILE: lisin.pep 

ORGANISM: Listeria innocua; L innocua; lisin 
DOMAIN: Eubacteria 
(No. 62) 

PEP FILE: lismo.pep 

ORGANISM: Listeria monocytogenes; L monocytogenes; lismo 
DOMAIN: Eubacteria 
(No. 63) 

PEP FILE: metac.pep 

ORGANISM: Methanosarcina acetivorans; M acetivorans; metac 
DOMAIN: Archaebacteria 
(No. 64) 

PEP FILE: metka.pep 

ORGANISM: Methanopyrus kandleri; M kandleri; metka 
DOMAIN: Archaebacteria 
(No. 65) 

PEP FILE: metth.pep 

ORGANISM: Methanobacterium thermoautotrophicum; M thermoautotrophicum; metth 
DOMAIN: Archaebacteria 
(No. 66) 

PEP FILE: mettm.pep 

ORGANISM: Methanobacterium thermoautotrophicum; M thermoautotrophicum; mettm
DOMAIN: Archaebacteria 
(No. 67) 

PEP FILE: mouse.pep 

ORGANISM: Mus musculus; M musculus; mouse 
DOMAIN: Eukaryote 
(No. 68) 

PEP FILE: muhv4.pep 

ORGANISM: Murine herpesvirus 68 strain WUMS; Murine herpesvirus 68 strain WUMS; 
muhv4 

DOMAIN: virus 
(No. 69) 

PEP FILE: mycav.pep 

ORGANISM: Mycobacterium avium; M avium; mycav 

DOMAIN: Eubacteria 

(No. 70) 

PEP FILE: mycbo.pep 

ORGANISM: Mycobacterium bovis AF2122/97; M bovis AF2122/97; mycbo
DOMAIN: Eubacteria 
(No. 71)

PEP FILE: mycga.pep 

ORGANISM: Mycoplasma gallisepticum; M gallisepticum; mycga 
DOMAIN: Eubacteria 
(No. 72) 

PEP FILE: mycge.pep 

ORGANISM: Mycoplasma genitalium; M genitalium; mycge 
DOMAIN: Eubacteria 
(No. 73) 

PEP FILE: mycms.pep 

ORGANISM: Mycoplasma mycoides (subsp. mycoides SC); M mycoides (subsp. mycoides 
SC); mycms 

DOMAIN: Eubacteria 

(No. 74) 

PEP FILE: mycpn.pep 

ORGANISM: Mycoplasma pneumoniae; M pneumoniae; mycpn 
DOMAIN: Eubacteria 
(No. 75) 

PEP FILE: mycpu.pep 

ORGANISM: Mycoplasma pulmonis; M pulmonis; mycpu 

DOMAIN: Eubacteria 

(No. 76) 

PEP FILE: neime.pep 

ORGANISM: Neisseria meningitidis; N meningitidis; neime 
DOMAIN: Eubacteria 
(No. 77) 

PEP FILE: niteu.pep 

ORGANISM: Nitrosomonas europaea; N europaea; niteu 
DOMAIN: Eubacteria 
(No. 78) 

PEP FILE: oceih.pep 

ORGANISM: Oceanobacillus iheyensis; O iheyensis; oceih 
DOMAIN: Eubacteria 
(No. 79) 

PEP FILE: porgi.pep 

ORGANISM: Porphyromonas gingivalis; P gingivalis; porgi 

DOMAIN: Eubacteria 

(No. 80) 

PEP FILE: pseae.pep 

ORGANISM: Pseudomonas aeruginosa; P aeruginosa; pseae 
DOMAIN: Eubacteria 
(No. 81)

PEP FILE: psepu.pep 

ORGANISM: Pseudomonas putida; P putida; psepu 
DOMAIN: Eubacteria 
(No. 82) 

PEP FILE: pyrab.pep 

ORGANISM: Pyrococcus abyssi; P abyssi; pyrab 
DOMAIN: Archaebacteria 
(No. 83) 

PEP FILE: pyrfu.pep 

ORGANISM: Pyrococcus furiosus; P furiosus; pyrfu 
DOMAIN: Archaebacteria 
(No. 84) 

PEP FILE: pyrho.pep 

ORGANISM: Pyrococcus horikoshii; P horikoshii; pyrho 
DOMAIN: Archaebacteria 
(No. 85) 

PEP FILE: ralso.pep 

ORGANISM: Ralstonia solanacearum; R solanacearum; ralso 

DOMAIN: Eubacteria 

(No. 86) 

PEP FILE: rhilo.pep 

ORGANISM: Rhizobium loti; R loti; rhilo 
DOMAIN: Eubacteria 
(No. 87) 

PEP FILE: riccn.pep 

ORGANISM: Rickettsia conorii; R conorii; riccn 
DOMAIN: Eubacteria 
(No. 88) 

PEP FILE: schpo.pep 

ORGANISM: Schizosaccharomyces pombe; S pombe; schpo

DOMAIN: Eukaryote 

(No. 89) 

PEP FILE: shifl.pep 

ORGANISM: Shigella flexneri; S flexneri; shifl

DOMAIN: Eubacteria 

(No. 90) 

PEP FILE: staau.pep 

ORGANISM: Staphylococcus aureus; S aureus; staau 
DOMAIN: Eubacteria 
(No. 91)

PEP FILE: strag.pep 

ORGANISM: Streptococcus agalactiae; S agalactiae; strag 
DOMAIN: Eubacteria 
(No. 92) 

PEP FILE: strco.pep 

ORGANISM: Streptomyces coelicolor; S coelicolor; strco 
DOMAIN: Eubacteria 
(No. 93) 

PEP FILE: strpn.pep 

ORGANISM: Streptococcus pneumoniae; S pneumoniae; strpn 

DOMAIN: Eubacteria 

(No. 94) 

PEP FILE: strpy.pep 

ORGANISM: Streptococcus pyogenes; S pyogenes; strpy 
DOMAIN: Eubacteria 
(No. 95) 

PEP FILE: sulso.pep 

ORGANISM: Sulfolobus solfataricus; S solfataricus; sulso 
DOMAIN: Archaebacteria 
(No. 96) 

PEP FILE: theac.pep 

ORGANISM: Thermoplasma acidophilum; T acidophilum; theac 
DOMAIN: Archaebacteria 
(No. 97) 

PEP FILE: thema.pep 

ORGANISM: Thermotoga maritima; T maritima; thema 
DOMAIN: Eubacteria 
(No. 98) 

PEP FILE: trepa.pep 

ORGANISM: Treponema pallidum; T pallidum; trepa 

DOMAIN: Eubacteria 

(No. 99) 

PEP FILE: ureur.pep 

ORGANISM: Ureaplasma urealyticum; U urealyticum; ureur 

DOMAIN: Eubacteria 

(No. 100) 

PEP FILE: vibch.pep 

ORGANISM: Vibrio cholerae; V cholerae; vibch 
DOMAIN: Eubacteria 
(No. 101)

PEP FILE: vibpa.pep 

ORGANISM: Vibrio parahemolyticus RIMD 2210633; V parahemolyticus RIMD 2210633; 
vibpa 

DOMAIN: Eubacteria 
(No. 102) 

PEP FILE: wolsu.pep 

ORGANISM: Wolinella succinogenes; W succinogenes; wolsu 
DOMAIN: Eubacteria 
(No. 103) 

PEP FILE: xanac.pep 

ORGANISM: Xanthomonas axonopodis (pv. citri); X axonopodis (pv. citri); xanac 

DOMAIN: Eubacteria 

(No. 104) 

PEP FILE: xylfa.pep 

ORGANISM: Xylella fastidiosa; X fastidiosa; xylfa 
DOMAIN: Eubacteria 
(No. 105) 

PEP FILE: yeast.pep 

ORGANISM: Saccharomyces cerevisiae; S cerevisiae; yeast 

DOMAIN: Eukaryote 

(No. 106) 

PEP FILE: yerpe.pep 

ORGANISM: Yersinia pestis; Y pestis; yerpe 
DOMAIN: Eubacteria 
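
The record layout used in this table (a "(No. n)" line followed by PEP FILE, ORGANISM and DOMAIN fields) lends itself to simple parsing, for example to recount the organisms per domain. The sketch below is illustrative only: the regular expression and the function name `parse_records` are assumptions, and the sample text contains just three records copied from the listing above.

```python
import re
from collections import Counter

# One record: "(No. n)", then PEP FILE, ORGANISM (may wrap over lines), DOMAIN.
RECORD = re.compile(
    r"\(No\.\s*(\d+)\)\s*"
    r"PEP FILE:\s*(\S+)\s*"
    r"ORGANISM:\s*(.+?)\s*"
    r"DOMAIN:\s*(\S+)",
    re.DOTALL,
)

def parse_records(text):
    """Parse the listing into dictionaries, one per organism."""
    records = []
    for no, pep_file, organism, domain in RECORD.findall(text):
        # The ORGANISM field holds "full name; short name; code".
        parts = [" ".join(p.split()) for p in organism.split(";")]
        records.append({
            "no": int(no),
            "pep_file": pep_file,
            "name": parts[0],
            "code": parts[-1],
            "domain": domain,
        })
    return records

SAMPLE = """
(No. 1)

PEP FILE: achfl.pep

ORGANISM: Acholeplasma florum (Mesoplasma florum); A (M) florum; achfl
DOMAIN: Eubacteria
(No. 3)

PEP FILE: aerpe.pep

ORGANISM: Aeropyrum pernix K1; A pernix K1; aerpe
DOMAIN: Archaebacteria
(No. 22)

PEP FILE: bucai.pep

ORGANISM: Buchnera aphidicola (subsp. Acyrthosiphon pisum); B aphidicola (subsp.
Acyrthosiphon pisum); bucai

DOMAIN: Eubacteria
"""

records = parse_records(SAMPLE)
domain_counts = Counter(r["domain"] for r in records)
print(domain_counts)
```

Applied to the full listing, the same count could be checked against the totals stated in the table notes.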
Supplementary Table 2: Data of η, θ and the comparison between theoretical predictions
and experimental observations for genome size and gene number.

Notes: The serial numbers of the organisms here are the same as those in
Supplementary Table 1. The non-coding DNA contents η and the genome sizes are
obtained from Ref. [18]; 54 of its species (6 eukaryotes, 5 archaebacteria and 43
eubacteria, i.e., 48 prokaryotes in total) are also found in the PEP database. The gene numbers
are the numbers of Open Reading Frames (ORFs) in the proteomes in PEP. For human, the
non-coding content in this table follows the draft human genome in Ref. [18]. To calculate
the time of the Cambrian explosion, however, we use the more precise value of η* from
the finished euchromatic sequence of the human genome in Ref. IfTTl!.
No.  η       θ       genome size  S(η,θ)       gene number  N(η,θ)
1    -       0.4960  -            -            683          -
2    -       0.2571  -            -            3322         -
3    0.1088  0.4874  1669695      9.6490e+005  2694         839.3707
4    0.1170  0.2173  5674062      4.7051e+006  5402         4.7732e+003
5    -       0.2238  -            -            5274         -
6    0.0700  0.3620  1551335      1.5559e+006  1522         1.7165e+003
7    0.7120  0.2096  115409949    1.8103e+008  25541        1.8126e+004
8    0.0780  0.2996  2178400      2.3277e+006  2406         2.5982e+003
9    0.1590  0.2681  5370060      4.5484e+006  5311         3.7826e+003
10   0.1600  0.2452  546909       5.2118e+006  5274         4.3859e+003
11   0.1300  0.2428  4214810      4.4046e+006  4099         4.1737e+003
12   -       0.2617  -            -            4776         -
13   -       0.3886  -            -            1482         -
14   -       0.4112  -            -            1141         -
15   -       0.2575  -            -            3584         -
16   -       0.2649  -            -            4986         -
17   0.0630  0.4649  1443725      8.3102e+005  850          877.8284
18   -       0.2744  -            -            4184         -
19   -       0.6183  -            -            3446         -
20   -       0.1805  -            -            8307         -
21   0.1300  0.3146  3294935      2.9287e+006  2059         2.6414e+003
22   0.1640  0.5060  618000       1.2136e+006  574          840.3921
23   0.1700  0.5028  640000       1.2814e+006  546          868.7162
24   -       0.5092  -            -            504          -
25   0.7419  0.1945  97000000     2.3647e+008  21832        2.1291e+004
26   0.0570  0.3441  1641181      1.5915e+006  1633         1.8700e+003
27   -       0.5105  -            -            583          -
28   0.0940  0.2471  4016942      3.4556e+006  3737         3.7569e+003
29   -       0.4177  -            -            998          -
30   -       0.4590  -            -            907          -
31   0.1110  0.4107  2154946      1.5126e+006  2252         1.3753e+003
32   -       0.4502  -            -            894          -
33   0.1100  0.2228  4751080      4.3729e+006  4396         4.5420e+003
34   0.1200  0.2442  3940880      4.1134e+006  3847         4.0489e+003
35   0.1690  0.2695  3031430      4.7935e+006  2722         3.8301e+003
36   -       0.3379  -            -            2373         -
37   -       0.2937  -            -            2269         -
38   -       0.2943  -            -            2947         -
39   -       0.2645  -            -            2989         -
40   0.1100  0.4060  1995275      1.5434e+006  2009         1.4133e+003
No.  η       θ       genome size  S(η,θ)       gene number  N(η,θ)
41   0.0910  0.2753  3284156      2.8916e+006  3099         3.1197e+003
42   -       0.3107  -            -            3524         -
43   0.8100  0.2562  120000000    2.5164e+008  18358        1.6650e+004
44   0.1220  0.2247  4641000      4.6505e+006  4281         4.6032e+003
45   0.1200  0.2852  3218031      3.2588e+006  3145         3.1186e+003
46   -       0.2226  -            -            4463         -
47   0.1020  0.3149  2714500      2.4678e+006  2067         2.4821e+003
48   -       0.2462  -            -            4425         -
49   -       0.4102  -            -            1715         -
50   0.1500  0.3434  4524893      2.8080e+006  1709         2.2966e+003
51   -       0.3185  -            -            2058         -
52   -       0.6795  -            -            202          -
53   0.0700  0.3495  1799146      1.6699e+006  1874         1.8581e+003
54   0.0920  0.3633  1643831      1.7640e+006  1564         1.7844e+003
55   0.9830  0.1889  3.0000e+009  1.0522e+009  37229        3.7131e+004
56   -       0.3399  -            -            1813         -
57   0.1260  0.3358  2365589      2.5342e+006  2266         2.2879e+003
58   -       0.2637  -            -            3002         -
59   -       0.3320  -            -            2023         -
60   -       0.2837  -            -            3652         -
61   0.0970  0.2748  3011209      3.0073e+006  2968         3.1706e+003
62   0.0970  0.2622  2944528      3.2293e+006  2833         3.4342e+003
63   -       0.2999  -            -            4540         -
64   -       0.3418  -            -            1687         -
65   0.0800  0.3228  1751377      2.0652e+006  1873         2.2511e+003
66   -       0.3222  -            -            1869         -
67   0.9500  0.1828  2.5000e+009  8.9214e+008  28085        3.5960e+004
68   -       0.8092  -            -            80           -
69   -       0.2537  -            -            4340         -
70   0.0900  0.2451  4345492      3.4112e+006  3906         3.7721e+003
71   -       0.5086  -            -            726          -
72   0.1200  0.5416  580070       7.5934e+005  484          609.2149
73   -       0.5674  -            -            1016         -
74   -       0.4804  -            -            686          -
75   0.0860  0.4867  963879       8.4373e+005  778          802.6126
76   0.1710  0.3407  2184406      3.2388e+006  2065         2.4452e+003
77   -       0.3436  -            -            2461         -
78   -       0.2513  -            -            3496         -
79   -       0.3800  -            -            1909         -
80   0.1060  0.2167  6264403      4.4184e+006  5563         4.6810e+003
No.  η       θ       genome size  S(η,θ)       gene number  N(η,θ)
81   -       0.2240  -            -            5316         -
82   -       0.3236  -            -            1764         -
83   -       0.3071  -            -            2065         -
84   0.1320  0.3851  6397126      1.9863e+006  2064         1.6935e+003
85   0.1270  0.2242  5810922      4.8078e+006  5092         4.6687e+003
86   -       0.1953  -            -            7264         -
87   0.1900  0.5019  1268755      1.4540e+006  1374         912.3452
88   0.4250  0.3018  13800000     1.8831e+007  4987         5.4215e+003
89   -       0.4419  -            -            4176         -
90   0.1690  0.2845  2878084      4.4024e+006  2631         3.4816e+003
91   -       0.3210  -            -            2121         -
92   0.1110  0.1809  8670000      5.5793e+006  7894         5.9409e+003
93   -       0.3949  -            -            2094         -
94   -       0.3350  -            -            1845         -
95   -       0.3006  -            -            2977         -
96   0.1300  0.3378  1564905      2.5671e+006  1478         2.2787e+003
97   0.0500  0.3316  1860725      1.6375e+006  1846         1.9943e+003
98   0.1280  0.4457  1900521      1.3744e+006  1031         1.1417e+003
99   0.0700  0.5053  751719       6.8918e+005  611          688.9768
100  0.1255  0.3336  4034065      2.5593e+006  2736         2.3187e+003
101  -       0.2561  -            -            4800         -
102  0.0600  0.3362  2110355      1.6949e+006  2044         1.9790e+003
103  0.1440  0.2545  5175554      4.4875e+006  4029         3.9940e+003
104  0.1200  0.4376  2679305      1.3711e+006  2763         1.1815e+003
105  -       0.3221  -            -            6356         -
106  0.1420  0.3265  4653728      2.9448e+006  4087         2.5139e+003
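
As a rough illustration of how the observed and predicted columns of this table can be compared, the sketch below computes the base-10 logarithm of the observed-to-predicted ratio for three rows copied from the table (No. 6, 44 and 55). Using a log ratio as the measure of agreement is an assumption made here for illustration, not necessarily the authors' own criterion; a value near 0 means close agreement, and a value of 1 would mean a tenfold discrepancy.

```python
from math import log10

# (No., organism, observed genome size, predicted S(η,θ),
#  observed gene number, predicted N(η,θ)); values copied from the table.
ROWS = [
    (6, "Aquifex aeolicus", 1551335, 1.5559e6, 1522, 1.7165e3),
    (44, "Escherichia coli", 4641000, 4.6505e6, 4281, 4.6032e3),
    (55, "Homo sapiens", 3.0000e9, 1.0522e9, 37229, 3.7131e4),
]

def log_ratio(observed, predicted):
    """log10(observed / predicted): 0 means exact agreement."""
    return log10(observed / predicted)

def compare(rows):
    """Map each serial number to (genome size log ratio, gene number log ratio)."""
    return {
        no: (log_ratio(size, s_pred), log_ratio(genes, n_pred))
        for no, _name, size, s_pred, genes, n_pred in rows
    }

report = compare(ROWS)
for no, name, *_ in ROWS:
    size_dev, gene_dev = report[no]
    print(f"No. {no:>3} {name:<18} genome size: {size_dev:+.3f}  genes: {gene_dev:+.3f}")
```

For these three rows the gene numbers agree closely, while the human genome size is under-predicted by roughly a factor of three.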






